A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions

Authors

  • Jinyu Li
  • Li Deng
  • Dong Yu
  • Yifan Gong
  • Alex Acero
Abstract

In this paper, we present our recent development of a model-domain environment-robust adaptation algorithm, which demonstrates high performance in the standard Aurora 2 speech recognition task. The algorithm consists of two main steps. First, the noise and channel parameters are estimated using multiple sources of information, including a nonlinear environment distortion model in the cepstral domain, the posterior probabilities of all the Gaussians in the speech recognizer, and a truncated vector-Taylor-series (VTS) approximation. Second, the estimated noise and channel parameters are used to adapt the static and dynamic portions (delta and delta-delta) of the HMM means and variances. This two-step algorithm enables joint compensation of both additive and convolutive distortions (JAC). The hallmark of our new approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures phase asynchrony between clean speech and the mixing noise. In the experimental evaluation using the standard Aurora 2 task, the proposed Phase-JAC/VTS algorithm achieves 93.32% word accuracy using the clean-trained complex HMM backend as the baseline system for the unsupervised model adaptation. This represents high recognition performance on this task without discriminative training of the HMM system. The experimental results show that the phase term, which was missing in all previous HMM-adaptation work, contributes significantly to the achieved high recognition accuracy.

Keywords: phase-sensitive distortion model, vector Taylor series, joint compensation, additive and convolutive distortions, robust ASR
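
The abstract names the two ingredients of the adaptation step, a phase-sensitive distortion model and a first-order VTS expansion, without giving formulas. The following is a minimal sketch of how those two pieces can fit together. It rests on several assumptions that are not stated on this page: it works on a single log filter-bank channel rather than in the cepstral domain, it treats only static parameters (no delta or delta-delta), it assumes the commonly used form |Y|^2 = |X|^2 |H|^2 + |N|^2 + 2 alpha |X| |H| |N| for the distortion, and the function names and the `alpha` parameter are hypothetical. It is not the authors' implementation.

```python
import math


def distorted_log_power(x, n, h, alpha=0.0):
    """Log power of distorted speech in one filter-bank channel.

    x, n, h: log power of clean speech, additive noise, and channel.
    alpha:   assumed phase factor in [-1, 1]; alpha = 0 gives the
             phase-insensitive model.
    """
    d = n - x - h
    arg = 1.0 + math.exp(d) + 2.0 * alpha * math.exp(0.5 * d)
    if arg <= 0.0:
        # Happens only at alpha = -1 with d = 0 (complete cancellation).
        raise ValueError("log argument is not positive")
    return x + h + math.log(arg)


def adapt_gaussian_vts(mu_x, var_x, mu_n, var_n, mu_h, alpha=0.0):
    """First-order VTS adaptation of one static Gaussian (scalar case).

    The expansion point is (mu_x, mu_n, mu_h). The channel is treated as
    deterministic, so it shifts the mean but adds no variance.
    """
    d = mu_n - mu_x - mu_h
    denom = 1.0 + math.exp(d) + 2.0 * alpha * math.exp(0.5 * d)
    if denom <= 0.0:
        raise ValueError("log argument is not positive")
    # Zeroth-order term: adapted mean is the model evaluated at the means.
    mu_y = mu_x + mu_h + math.log(denom)
    # Partial derivatives: dy/dn = f, dy/dx = 1 - f.
    f = (math.exp(d) + alpha * math.exp(0.5 * d)) / denom
    g = 1.0 - f
    # First-order term: variances combine through the squared derivatives.
    var_y = g * g * var_x + f * f * var_n
    return mu_y, var_y


if __name__ == "__main__":
    # Hypothetical values: speech and noise at equal log power.
    print(adapt_gaussian_vts(0.0, 1.0, 0.0, 1.0, 0.0, alpha=0.0))
    print(adapt_gaussian_vts(0.0, 1.0, 0.0, 1.0, 0.0, alpha=0.5))
```

With `alpha = 0` and equal speech and noise power, the adapted mean rises by log 2 and the two variances are weighted equally. A nonzero `alpha` changes both the mean shift and the weights, which is the effect the abstract attributes to the phase term.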

Similar resources

Unscented transform with online distortion estimation for HMM adaptation

In this paper, we propose to improve our previously developed method for joint compensation of additive and convolutive distortions (JAC) applied to model adaptation. The improvement entails replacing the vector Taylor series (VTS) approximation with unscented transform (UT) in formulating both the static and dynamic model parameter adaptation. Our new JAC-UT method differentiates itself from o...

Towards High-Accuracy Low-Cost Noisy Robust Speech Recognition Exploiting Structured Model

It is well known that distorted speech can be considered as generated from clean speech by additive noise and a convolutive channel. In this paper, we present our recent study on using this structured model of physical distortion for robust automatic speech recognition. Three methods are introduced for joint compensation of additive and convolutive distortions (JAC), with diffe...

A novel use of residual noise model for modified PMC

In this paper, a new approach based on model adaptation is proposed for the acoustic mismatch problem. A specific bias model, the residual noise model, is presented, which is the joint compensation model for additive and convolutive bias. The novel noise model is estimated by maximum likelihood. In conjunction with Parallel Model Combination (PMC), it is effective for noisy enviro...

Optimal Locating and Sizing of Unified Power Quality Conditioner- phase Angle Control for Reactive Power Compensation in Radial Distribution Network with Wind Generation

In this article, multi-objective planning is demonstrated for reactive power compensation in radial distribution networks with wind generation via a unified power quality conditioner (UPQC). A UPQC model based on phase angle control (PAC) is used. In the presented method, optimal locating of the UPQC-PAC is done by simultaneously minimizing objective functions such as: grid power loss, percentage of n...

An RNN-based compensation method for Mandarin telephone speech recognition

In this paper, a novel architecture, which integrates the recurrent neural network (RNN) based compensation process and the hidden Markov model (HMM) based speech recognition process into a unified framework, is proposed. The RNN is employed to estimate the additive bias, which represents the telephone channel effect, in the cepstral domain. Compensation of telephone channel effects is implemen...

Journal:
  • Computer Speech & Language

Volume: 23

Publication year: 2009